Bose-Einstein distribution, condensation transition and multiple stationary states 

in multiloci evolution of diploid populations 



Ginestra Bianconi^ and Olaf Rotzschke^ 

^Physics Department, Northeastern University, Boston, Massachusetts 02115 USA, 
^Singapore Immunology Network (SIgN), Biomedical Sciences Institutes (BMSI), 
Agency for Science, Technology and Research (A*STAR), IMMUNOS, 138648, Singapore, Singapore 

The mapping between genotype and phenotype is encoded in the complex web of epistatic inter- 
action between genetic loci. In this rugged fitness landscape, recombination processes, which tend 
to increase variation in the population, compete with selection processes that tend to reduce genetic 
variation. Here we show that the Bose-Einstein distribution describe the multiple stationary states 
of a diploid population under this multi-loci evolutionary dynamics. Moreover, the evolutionary 
process might undergo an interesting condensation phase transition in the universality class of a 
Bose-Einstein condensation when a finite fraction of pairs of linked loci, is fixed into given allelic 
states. Below this phase transition the genetic variation within a species is significantly reduced and 
only maintained by the remaining polymorphic loci. 
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I. INTRODUCTION 



The deep relation between evolutionary theory and sta- 
tistical mechanics has been fascinating most of the scien- 
tists working in the field. Historically, Fisher compared 
his fundamental theorem of natural selection to the sec- 
ond law of thermodynamics [l| and Kimura, referring to 
the stochasticity of the evolutionary process, has com- 
pared the genetic theory of evolution to the theory of 
gases • More recently, further relations between evolu- 
tionary theory and statistical mechanics have been identi- 
fied H Q • Indeed the relation between chance and neces- 
sity [5| in evolutionary theory, i.e. the trade-off between 
stochastic processes and selection, has the potential to 
be fully described in statistical mechanics terms. Inter- 
estingly, the relation between evolutionary theory and 
quantum statistical mechanics is emerging from a series 
of independent works [6l-[l3j where it has been highlighted 
that a class of phase transitions occurring in evolution 
of haploid population can be described in terms of a 
Bose-Einstein condensation. These transitions are called 
in the biological literature quasispecics transitions and 
represent collective change in the populations when mu- 
tations compete with natural selection. A quasispecics 
transition that can be mapped to a Bose-Einstein con- 
densation, occurs in asexual populations when the muta- 
tion rate is below a critical value where a finite fraction 
of the population is localized in sequence space around 
a given genotype. In this paper we show that the Bose- 
Einstein distribution and condensation transition in the 
Bose-Einstein universality class occur in the evolutionary 
theory of diploid populations when recombination com- 
petes with selection. Therfore we propose that Bose- 
Einstein statistic is the emergent statistics both in evo- 
lution of haploid and diploid populations when there is 
a competition between processes enhancing genetic vari- 
ation in the population (i.e. mutations or recombination 
process) and selection processes (which tend to decrease 



genetic variation in the population). 

The genomic revolution started with the publication 
of the entire sequence of the human genome [3, [l^ has 
made possible the complete analysis of genome variations 
encoded in single nucleotide polymorphisms (SNPs). The 
complete set of SNPs of an individual characterizes to- 
gether with the copy number variations what is unique 
about an individual. Variations in SNPs allelic states (i.e. 
different nucleotide composition of the SNPs) can affect 
how humans develop diseases and how the y respo nd to 
pathogens or drugs. It is well established p^l - l2ll |. that 
genes are integrated in functional pathways and interact 
through complex biological networks. Single nucleotide 
polymorphisms can affect expression or function of the 
genes and they are correlated when gene products are 
part of a joint pathway. If the functional pathways con- 
stitute the phenotype, then the interaction between the 
complete set of SNPs and these pathways encodes for the 
genotype-phenotype mapping. Consequently, SNPs are 
interacting through the bio-molecular networks and their 
contribution to the fitness of an individual is encoded in 
complex epistatic interactions between the SNPs [12, . 

A recent paper has signed a turn-over in the study 
of epistatic interactions 24 1 . Indeed in the epistatic 
network between a large set of pairs of mutations in yeast, 
has been fully characterized. This work demonstrates the 
possibility of collecting data for this new fundamental 
type of biological network (l6l - l2]| in simple organisms, 
by measuring the effect on fitness of pairs of mutations 
in yeast. From the structural point of view, this net- 



work is both modular 21, |25| and fat tail [l6|, |20[ as 
the regulatory network [2l|, the protein interaction net- 
work [26l - [28j and the metabolic network [2^. From the 
evolutionary point of view, this epistatic network sheds 
light at the genotype-phenotype relation and it reveals 
a functional map of the cell in which genes with highly 
correlated profiles delineate specific pathways. Similar 
networks exist also in higher organisms and, importantly, 
a substantial number of genes are regulated on the pop- 
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ulation level by the allelic states of polymorphic loci. A 
functional genome analysis of the signaling pathways of 
human thrombocytes revealed that a striking number of 
genes of the response cascade is under allelic control [13] ■ 

Linkage disequilibrium between SNPs [2^ is a key 
quantity for identifying genes that are related to spe- 
cific diseases. In particular, linkage disequilibrium indi- 
cates the non-random association of alleles at two or more 
loci (SNPs) and is widely observed through the human 
genome [311 • Non random mating of a population and 
variation of the cross-over rate and finite evolutionary 
times contributes to the occurrence of linkage disequi- 
librium in diploid populations. However, also epistatic 
interactions between genetic loci contributes to the ob- 
served linkage disequilibrium. There is compelling ev- 
idence that linkage disequilibrium occurs not only be- 
tween genetic loci in close proximity to the chromosome 
but also between genetic loci at significant distance on 
the same chromosome or even on different chromosomes, 
as summarized in the scheme shown in Fig. [TJ In order to 
explain this phenomenon it is necessary to describe the 
long-distance epistatic interactions between SNPs, which 
are not exclusively weak. 

In order to develop an evolutionary theory in pres- 
ence of epistatic interactions it is necessary to go beyond 
the well defined single locus evolution |32| - [35j . Neverthe- 
less, most of the available mutiloci evolutionary theories 
js^ l are typically limited to a few numbers of loci. A 
relevant exception is the recent paper of R. Neher and 
B. I. Shraiman [l^ where the authors have studied the 
role of the crossover rate in an evolutionary theory of 
a large number of genetic loci, interacting epistatically 
in a network. Interestingly, they found by mean-field 
arguments and by numerical simulations, that the evo- 
lutionary model shows a phase transition responsible for 
the level of genetic variation in a population. In fact, in 
their evolutionary theory, for high recombination rates 
the population is in the "allele selection" phase in which 
genetic loci are only weakly associated, while for low re- 
combination rates the population is in the " genotype se- 
lection" phase consisting of a set of competing genotypes 
locked in given allelic combinations. 

In this paper we study the genetic variability of sex- 
ually reproducing diploid populations where free genetic 
recombination competes with Darwinian natural selec- 
tion under different strengths of the selective pres- 
sure. We consider an epistatic network of N loci (with 

3> 1) of a general topology, and we take the fitness of 
an individual as a function of the allelic states of genetic 
loci in this epistatic network. In order to approach the 
formidable task to tackle the complexity of a mutiloci 
evolution, we neglect mutations (that further contribute 
to the genetic variation in the population) and we assume 
free recombination and infinite population and time limit. 
Finally, by making an ansatz on the shape of the distri- 
bution of gametes allelic state in the population, we are 
able to characterize all the stationary states in linkage 
equilibrium (while we leave to subsequent publications 



the study of solutions compatible with linkage disequi- 
librium) . The technical improvement with respect to the 
previous mutiloci theories that makes our theory exactly 
solvable, is due to the advantages of defining the fitness 
function over an epistatic network and using the most re- 
cent developments of statistical mechanics [l3: 137l - l40j . In 
particular in this paper we make use of a self consistent 
argument [l^l combined with the cavity method (STi - lioj . 

The stationary states are multiple and therefore, 
asymptotically in time, the state of a population depends 
on the initial conditions. Unexpectedly, the joint fre- 
quency of allelic states of pairs of linked loci, at stationar- 
ity, is expressed in terms of a Bose-Einstein distribution. 
In a quantum Bose gas, the Bose-Einstein distribution 
describes the occupation number of energy levels. More- 
over, a quantum Bose gas, below a critical temperature, 
might undergo a Bose-Einstein condensation transition 
in which a finite fraction of particles arc found in the 
ground state. In our mutiloci evolution dynamics for 
diploid populations, the relation of the steady state so- 
lution with the Bose-Einstein distribution, allows us to 
predict a condensation transition in the Bose-Einstein 
universality class [4l|, |42|. In this evolutionary model, a 
pair of genetic loci is in the "ground state" when they 
are fixed, i.e. when they are in a given allelic state (not 
necessarily the same allelic state in each genetic locus) 
and they are not any more polymorphic. For a given 
value of the selective pressure, and a suitable topology of 
the epistatic network that allows for a condensate phase, 
a finite fraction of pairs of loci is fixed. Therefore a fi- 
nite fraction of pairs of loci is not any more polymor- 
phic and the number of polymorphic loci is significantly 
reduced. The basic mechanism behind the studied con- 
densation of genetic loci, is a cooperative phenomenon in 
which, as the selection pressure increases, one locus that 
is fixed in a beneficial configuration, affects the other 
linked loci inducing them to fix in given allelic config- 
urations, generating a macroscopic phase transition. A 
similar phenomenon is also known in the two-loci evolu- 
tionary dynamics and is called in the literature "genetic 
hitchhiking" [i^ . We note here that the phase transition 
observed in our model should be distinguished from the 
" genotype selection" of [ll] because the genotypes in the 
population maintain a significant genetic variation due 
to the remaining polymorphic loci which have not been 
fixed. Interestingly, another phase transition between a 
polymorphic phase and a frozen phase has been numeri- 
cally observed at a critical value of the mutation rate in 
sexual populations [4^ . 

The paper is divided as follows: in Sec. II we define the 
fitness function that drives the evolutionary dynamics in 
presence of a complex epistatic network; in Sec. Ill we 
define the evolutionary dynamic equation under consid- 
eration; in Sec. IV we highlight the results regarding the 
steady state population of the evolutionary dynamics, in- 
cluding all the details of the derivations in the subsequent 
appendixes; and finally we give the conclusions. 
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FIG. 1: (Color online) Linkage disequilibrium (LD) between 
clusters of SNPs. Single nucleotide polymorphisms are usually 
arranged in small clusters with all members being in complete 
linkage (haplotype). SNPs epistatic interactions are mediated 
by the transcription factors (TF) binding on regulatory re- 
gions and by the genes interacting in regulatory and signaling 
networks. Linkage disequilibrium between clusters of single 
nucleotide polymorphism (SNPs) can either appear when two 
clusters are in close proximity along a chromosome (as for ex- 
ample LDi2 in the figure) or when the SNPs clusters are at 
a significant distance along the chromosome or even in two 
different chromosomes (as for example LD13 in the figure). 
While different recombination rates might explain part of the 
linkage disequilibrium for clusters of SNPs in proximity along 
the chromosome, only epistatic interactions can explain link- 
age disequilibrium for distant clusters. 



II. THE FITNESS FUNCTION AND THE 
EPISTATIC NETWORK 



Haploid cells have a single copy of each chromosome. 
On the contrary, diploid cells have two homologous copies 
of each chromosome (see Fig. [5]). Usually the genome 
of each diploid individual is given by the pairs of chro- 
mosomes A and B of the two haploid gametes coming 
from the father and from the mother of the individual. 
Let us suppose that each gamete is identified by N loci 
indicated with latin letters i = 1,2, . . . , N . If we in- 
dicate with Xi the allelic state at each locus i, the ga- 
mete is characterized by the sequence {xi}i^i^2,...,N with 
Xi = 1,2, ... ,Q and Q given by the biochemistry of the 
DNA, i.e. (3 = 4 indicating respectively the pair of or- 
dered nucleotides AT (Adenine, Thymine), TA (thymine, 
adenine), CG (cytosinc, guanine) or GC (guanine, cyto- 
sine) in the double stranded DNA. Given this descrip- 
tion of the gametes, each individual is characterized by 
the sequence {xf, xf}i^i^2,...,N where xf^^ indicates the 
allelic states in each parental gametes A/B. In the mul- 
tiplicative non-epistatic (NE) scenario the fitness func- 
tion W^^{{x'^,x^}) factorizes into contributions com- 
ing from independent single loci, i.e. allelic states in a 
pair of loci do not have epistatic interactions. In this 
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FIG. 2: (Color online) Difference between haploid and diploid 
cells: (a) while haploid cells have a single copy of each chro- 
mosome, diploid (b) organisms have two homologous copies 
of each chromosome. 



traditional framework the fitness function is written as 

w^^{{x\x^})=l[^pri^f,^f)- (1) 

i 

In the free recombination hypothesis, the minimal modifi- 
cation to this theory that is compatible with the presence 
of epistatic interactions is that the fitness is a function of 
allelic states of pairs of loci. Therefore we assume that 
the loci i = 1,2 . . . , N are linked in an epistatic network 
formed by L links. We define by di the set of loci j linked 
with locus i in the network. The epistatic couplings be- 
tween pairs of loci have a role in determining the fitness 
function that can be modified with respect to the single 
locus expression ^ , according to the expression 



W{{x^,x^})^ n q^,,ixf,xf,xf,xf), 



(2) 



<ij> 



where the product is extended to all genetic loci < i,j > 
linked in the epistatic network. Therefore, the fitness 
function in Eq. ([2]) is the first non trivial correction to 
P]) and includes a product of contributions coming from 
pairs of diploid allelic states at different loci. 



We parametrize the functions (jytj {xf 
the following 



(f>ij{xi 



B „B 



-l3U,j(xf,xf,xf,xf) 



) as in 



(3) 



where the parameter (3 indicates the selective pressure. In 
fact for /3 = we recover a neutral theory while for large j3 



small variations of the function Uij(x^ , x, , x.^ , x 

) yield 



large changes in the fitness. 
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Furthermore the function 
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) has the following symmetries 
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FIG. 3: (Color online) Graphical representation of the trans- 
fer of genetic information in overlapping generations called in 
biology the gametic life cycle. This diagram describes two 
parental gametes A, B (solid lines) that give rise to a zygote 
and then to an individual AB (dashed line) by a fertilization 
process indicated with ©. This individual on his or her turn 
generates n = 3 during meiosis g) new gametes 5*1 , 52 , S3 . 




with the last relation valid only if we assume perfect sym- 
metry between the parental gametes, i.e. if we exclude 
to study the sex chromosomes X, Y . The fitness land- 
scape defined in Eq. ^ corresponds to a disordered 
Potts Hamiltonian and therefore it is in general char- 
acterized by many local maxima. 



III. EVOLUTIONARY DYNAMICS 



FIG. 4: (Color online) Graphical representation of meiosis. 
Starting from a given diploid genome (a) of an individual 
AB, as the outcome of replication (b), cross-over and genetic 
recombination (c) a set of haploid gametes 81,82, ■ ■ . 8n are 
formed (d). An exchange of genetic information might occur 
at each recombination site. In particular, every pair of genetic 
loci of the gamete either comse from a single parent or from 
different parents. 



The evolutionary dynamics of diploid populations de- 
scribes the information transfer of genetic information 
encoded in the gamete sequences. Each individual of a 
diploid population is carrying the information encoded 
in the gametes of their parents indicated as A/B. The 
evolution is due to the transmission of each individual to 
the next generation of new gametes which arc a random 
recombination of the information encoded in parental ga- 
metes A/B. In physical terms the process can be seen as 
a "scattering" process in which two gametes A/B gen- 
erate a new individual (fertilization) and the new indi- 
vidual, if it reaches the reproductive state, carries the 
information and gives rise (by meiosis) to new gametes 
Si, S2, ■ ■ ■ Sn with n ~ 0, 1, 2 . . .. We can visualize this 
process also called gametic life cycle by the diagram 
shown in Fig. |3] in which each solid line is a gamete and 
each dotted line is an individual. The vertices of this dia- 
gram are indicated with a sign © when fertilization occurs 
or with a sign ® when meiosis occurs, i.e. a new gamete 
is generated by a process of recombination of the diploid 
genetic information. The presence of these vertices of the 
diagram is an exclusive characteristics of diploid organ- 
isms wherever in haploid organism the single individual 
of the population is transmitting the genomic informa- 



tion to the next generation. The process of meiosis is a 
process of reduction in the genetic information of each 
diploid individual to generate gametes which have only 
half of the number of chromosomes. During meiosis (see 
Fig. 31) a process of recombination can occur with small 
probability at given locations (recombination hotspots) 
on the chromosome. When a recombination event oc- 
curs, homologous sites on two chromosomes can mesh 
with one another and may exchange genetic information. 
In our evolutionary dynamics we take the infinite popu- 
lation limit and we assume that at each genetic locus a 
recombination event can take place. Therefore, the prob- 
ability P{{x}) that a gamete has an allelic configuration 
{.t}, satisfies the following dynamical equation 



dt 



M. 



{x} 



W{{x^, xB})P{{x^})P{{xB}) 



{W) 



where VF({a 



diploid allelic configuration {a 



-P({x}) 
(5) 

'}) is the fitness of the individual of 



and ^ and where {W) is the average fitness 



given by Eqs. ^ 



{W) 



J2 W{{x^,x^})P{{x^})Pi{x^}). (6) 
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The operator M^^y introduced in Eq. (O indicates the 
free recombination of genetic material occurring when 
a new gamete is generated. In particular the operator 
Mjj,} is defined as the average over the probability of 
free recombination U{{x}\{x^,x^}), i.e. the action of 
the operator over a generic function /({x"^, x^}) is given 
by 

M{.j [f{{x^,x^})] = J2 n({x}|{x^,x^})/({.T^,x^}) 

{x^},{xB} 

(7) 

with 



N 



1=1 



^6{xi,xf) + ^S{x,,xf) 



(8) 



We note here that in this model we assume free recom- 
bination and equivalence between the parent gametes. 
Moreover, in order to simplify the treatment of the evolu- 
tionary model, we limit our study to evolution of diploid 
populations in absence of mutations. 



is a subset of the general type of solutions ([9]) . In partic- 
ular, in order for ([TT|) to be a solution of the stationary 
relation Eq. (fTO|) . P{{x}) must satisfy the condition 

P{{x^})Pi{x^}) = J, xf J, xf ) (12) 

where {x^j} indicates all the variables {x^} except vari- 
able {xf} and {x^^} indicates all the variables {x^} ex- 
cept variable {xf}. It can be easily shown that these con- 
ditions, enforce linkage equilibrium between allelic states. 
In this paper we restrict our attention to this type of so- 
lutions and we leave to subsequent publications the study 
of stationary state distributions compatible with linkage 
disequilibrium. 

The marginal frequencies pij{xi, Xj) of allelic states on 
pairs of linked loci are defined as 



Pij (Xj , Xj ) 



J2Pi{x})5ix, 

{x} 



x'^S{xj,x'A. 



(13) 



B. Bose-Einstein distribution 



IV. RESULTS 



General form of steady state probability 
distributions 



If the network is locally tree-like, the general structure 
of the solution to the evolutionary equation ([5]) is given 

by 



P({x})=^7r(/.) n b^M^^j)- 



(9) 



<i,3> 



where < i,j > indicates all the pairs of linked nodes 
present in the cpistatic network. 

In our model the fitness function of the type ([2]) and 
(|3|) , for any given epistatic network, is static and bounded 
from above, therefore we always expect to find asymptot- 
ically in time the population in a stationary state given 
by the solution of to the equation 



P{{x}) = 



W{{x'^, x^})P{{x^})P{{xB}) 
{W) 



(10) 

These stationary states, do not necessarily correspond to 
a maximum of the fitness . In the case treated in this 
paper, in which the epistatic network is fixed and locally 
tree-like, we can find, for every generic fitness function 
of type ([5]) and ([3]) , the possible stationary states of the 
population (see appendixes) of the type 



P{{x}) = n b^jix^,XJ), 



(11) 



where the product is extended to all genetic loci < i,j > 
linked in the epistatic network. The type of solutions PT|) 



In order to find all the stationary solutions solving Eq. 
([TU| of the form given by pT|) , we used a self-consistent 
argument [l3l combined with the cavity method [13, IH, 
|40|. In particular we find that in the stationary state, 
not necessary a maximum of the fitness function [45j . 
the marginal frequencies pij{xi,Xj) defined in Ec^. (|13p 
are given by (see Appendix [Blmc [C|) 



Pij{xi,Xj) = —Gij{xi,Xj) {1 + nB{eij{xi,Xj)]} (14) 

if ey (xi,Xj) > /i. The functions eij{xi,Xj),Gij{xi,Xj) in 
Eq. ([Tl]) and the constants /x, Z can be derived from the 
self consistent solution of the stationary state of the evo- 
lutionary dynamics described by Eq. ([5]) (see Appendix 
IB|) . In Eq. ([Ti|) nB(e) indicates the Bose-Einstein occu- 
pation number and fi indicates the "chemical potential" 
of the evolutionary dynamics. The Bose-Einstein occu- 
pation number is defined as 



nB[eijixi,x.j)] 



1 



oPlUj{xi,Xj)-fJ.] _ I ' 



(15) 



Equation (|14p relates the joint probability of pair of 
linked loci with a Bose-Einstein distribution arising in 
the study of quantum Bose gases [4l|, Here the 

functions, tij(xi,Xj) play the role of "energy states" 
of this Bose-Einstein distribution. These functions are 
not known a priori but they are the outcome of the 
evolutionary dynamics. A relevant aspect of this solution 
is that we might find several different sets of functions 
{tij(xi,Xj),Gij{xi,Xj)} and variables Z, /i that satisfy 
the stationary condition of the evolutionary dynamics. 
These different solutions have to be identified with 
different possible populations of a given species. In fact, 
given different initial conditions the population evolving 
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FIG. 5: (Color online) Numerical evidence for the condensa- 
tion phase transition. The fraction i/q of fixed pairs of loci is 
plotted as a function of the selective pressure /3. The data are 
averaged over 50 random fitness realizations and are shown 
for a epistatic network with degree distribution given by Eq. 
(|21|l . The fitness function used is given by Eqs. ((2]) and (|3]) 
with the matrix elements Uijixf' ,xf ,xf ,xf) satisfying the 
symmetry constraints @ and drawn randomly from a uni- 
form distribution in the interval (0,1). The data are shown 
for a number of genetic loci with N = 100, 200, 300. 

according to Eq. ([S]) can be found, asymptotically in 
time, in different stationary states. According to the 
general mutiloci evolutionary scenario [i^, these steady 
states do not in general correspond to local maxima of 
the fitness landscape. 

Interestingly, the marginal frequencies pij{xi,Xj) are 
significantly modified if a given pair of allelic configura- 
tion {x*,x*) reaches the minimal allowed "energy level" 
eij{x*,x*) = /i. In this case we found Gij{xi,Xj) = for 
every allelic state {xi , xj ) and the pairs of linked loci (i, j) 
gets fixed (see Appendix iPj). Therefore if eij{x*, x*) = fi, 
the joint probability pij{xi,Xj) is given by 

Pij{xi,Xj) = S{xi,x*)S{xj,x*). (16) 

To be specific in the terminology used, here and in the 
following we assume that a pair of genetic loci is fixed 
if and only the joint distribution is given by (|16p. i.e. if 
and only if both genetic loci {i and j) are fixed. 

C. Condensation transition 

The joint distributions ((T?)) and must always sat- 
isfy the normalization constraints 

1 = P^A^^^^]) (17) 

valid for every pair of linked loci (i, j). This set of equa- 
tions plays the role of " conservation" laws for the evolu- 
tionary dynamics and determines the phase diagram of 



the evolutionary dynamics. Inserting in (flT)) the expres- 
sion ([H]) for pij{xi,Xj) when eij{xi,Xj) < /i and expres- 
sion for pij{xi, Xj) when eij{xi,Xj) = fiwe can write 
the normalization conditions (|17p as in the following, 

= I E (18) 

Xi,Xj 

— ^ Gij {xi , Xj)nB [ttj {xi ,Xj)] 

valid for every pair of linked loci (i, j). In Eq. pSI) . nsii) 
is the Bosc-Einstcin distribution defined in Eq. (|15p and 
the matrix elements Vij are Vij = 1 the pairs of linked loci 
(«,j) are fixed, otherwise Vij = 0. Considering Eq. 
and averaging over all pairs of linked loci we can write 

{l-vo) = \[ de[l + nB{e)]gp{e) (19) 

where nsie) is the Bose-Einstcin distribution defined in 
Eq. (fT5|) . and i/q is the fraction of pairs of linked loci that 
are fixed in the population. The quantities 5/3 (e) and vq 
present in Eq. (|19p are given by 

gp{e) = 2™^^^ E 'Y Gvi^^:^3)XA4<^^J{x^,Xj) - e) 

^0 = 2Z ^ E (2^^) 

where X5{x) = \ li \x\ < 5 and xs{x) = otherwise. 
Depending on the form of (e) the solution of Eq. 
might indicate the occurrence, as a function of /?, of a 
condensation phase transition characterized by the order 
parameter vq. For epistatic network topologies and fit- 
ness functions which display this phase transition, we can 
distinguish, as a function of /3, between a noncondensed 
phase in which the fraction of fixed pairs of loci van- 
ishes in the thermodynamic limit, i.e. i/q — > as — oo 
and a "condensed phase" in which the fraction of fixed 
pairs of genetic loci is finite in the thermodynamic limit, 
i.e. fo — > I' > as cxo. 

This condensation phase transition is in the same uni- 
versality class of the Bose-Einstein condensation transi- 
tion as it depends on the value of the integral of a Bose- 
Einstein distribution present in Eq. p9)) . We observe, 
nevertheless, that Eq. p^ differs from the equation fix- 
ing the average number of particles in a Bose gas (ill, l42l | 
because the function gp{t) given by Eq. (|20p depends on 
j3 while the correspondent density of states in a quantum 
Bose gas is independent of /?. Moreover in Eq. ([20|) there 
is an additional factor Z in the right hand side with re- 
spect to the correspondent equation in the quantum Bose 
gas. 

In the noncondensed phase all genetic loci arc poly- 
morphic, on the contrary, in the "condensed phase" only 
a fraction of genetic loci is polymorphic. In which phase 



7 



arc diploid populations usually found? If wc assume that 
each base of the DNA is a candidate SNP, we observe that 
polymorphisms only occur in a finite fraction of bases. 
For example, in the human genome less than 1% of the 
bases corresponds to SNPs. Here we propose that the 
condensation of genetic loci due to epistatic interactions, 
might significantly contribute to the reduction in genetic 
variation within a species. 

D. Numerical evidence of the condensation 

transition 

While the results exposed in the preceding section 
are valid for any fitness function of type ^ and tree- 
like epistatic network, the actual phase diagram of the 
evolutionary dynamics might change depending on the 
topology of the network and on the detail of the fitness 
function. In this paragraph we show numerical evidence 
for the condensation of genetic loci by solving the self- 
consistent equations (see Appendix [B]) that determine 
{eij{xi,Xj),Gij{xi,Xj)} and vq, for a given fitness func- 
tion, starting from random initial conditions. In par- 
ticular we consider a network topology that allows for 
long-distance epistatic interactions (see Fig. [1]). We have 
therefore chosen to study an epistatic network with de- 
gree distribution 

P(k)o^k-^. (21) 

In Fig. [S] wc show evidence for the occurrence of the 
condensation transition of genetic loci when the epistatic 
network is a random network with degree distribution 
given by (PT|) and 7 = 3. In particular in Fig. [S]we have 
plotted the fraction of fixed pairs of loci (averaged over 
several random realizations of the fitness function) as a 
function of the evolutionary pressure /3. 

The "condensed phase" is defined as the region where 
I'o is large and docs not show finite size effects. Outside 
this region, instead, we have the "non condensed phase" 
where the fraction of fixed loci goes to zero in the limit 
of large iV, i.e. I'o — > as iV — >■ oo. The fitness func- 
tion used in the numerical solution reported in Fig. [SJ 
is given by Eqs. ([2]) and ([3]) with the matrix elements 
Uij{xf,x^,xf,x^) satisfying the symmetry constraints 
and drawn randomly from a uniform distribution in 
the interval (0,1). Finally, in order to reduce the time 
for the numerical solution of the self-consistent equations 
we have taken Q = 2. 

E. Condensation transitions in evolutionary 

dynamics 

Condensation phase transitions universally occur in 
evolutionary models. A pivotal condensation phase tran- 
sition occurs in the quasispecies evolutionary model 
of haploid populations that describes the competition be- 
tween random mutations, which tend to increase the ge- 



netic variation of the population, and natural selection 
which tends to reduce it. In the quasispecies model, 
when the mutation rate fi is less than a critical value 
/ic, i.e. fj, < fic, the haploid population is localized in the 
sequence space, and when, instead, fi > fic there is no 
possibility to define a typical sequence in the population. 
A condensation transition also occurs in the "house of 
cards" model of Kingman Q which describes the quasis- 
pecies model in the limit of infinite loci. Kingman char- 
acterized the condensation transition in the "house of 
cards" model but only recently, with the study of evolv- 
ing complex systems, i.e networks [l^l and ecosystems 
12 1 , and in a more elaborated model with pleiotropy [ll| 
it was recognized that this condensation can be mapped 
to a Bosc-Einstein condensation in a Rose gas. 

Condensation phase transitions also occur in diploid 
populations. In |13| it was shown that the phase transi- 
tion between the "allele selection" phase and the "geno- 
type selection" phase is a condensation phase transition 
below which, for low recombination rates, few genotypes 
are selected in the population. 

Here we show that a condensation transition in the 
Bose-Einstein universality class is also occurring in 
diploid populations in the presence of free genetic re- 
combination. The novelty of this transition is that the 
occurrence of the Bose-Einstein statistics is not caused 
by mutations (as it is the case for the quasispecies and 
the "house of cards" models) but only by genetic recom- 
bination. Moreover, this condensation transition differs 
from the transition between "allelic selection" and "geno- 
type selection" of [l3| because in the condensed phase 
of the present mutiloci evolutionary theory, the popu- 
lation maintains a wide variation although the number 
of polymorphic loci is significantly reduced. Finally, the 
condensation of genetic loci of diploid populations is a 
consequence of the non-trivial interactions of genetic loci 
in the epistatic network while in the quasispecies model 
and in the "house of cards" model the interactions be- 
tween the individuals of the population are only mediated 
by the competition for finite resources. Therefore the 
condensation of genetic loci in the present evolutionary 
theory relates to the condensation transition in the qua- 
sispecies model as the condensation transition in in- 
teracting quantum Bose gases [l^] relates to the conden- 
sation transition in non-interacting quantum Bose gases 
j4ll . Finally, it is fascinating to observe how different 
are the underlying mechanisms yielding to condensation 
transitions in haploid and diploid populations while both 
mechanisms have been selected by nature for their evo- 
lutionary advantages. 



V. CONCLUSIONS 

In conclusion we have studied a mutiloci evolutionary 
dynamics in sexually reproducing diploid populations in 
which random genetic recombination tends to increase 
genetic variation while natural selection tends to reduce 
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it. The mutiloci evolution is driven by a fitness func- 
tion defined on an epistatic network of genetic loci. We 
have found that the stationary states of this evolutionary 
dynamics are multiple, and depend on the initial con- 
dition of the population. Unexpectedly, we have found 
that the joint distributions of allelic states at linked loci, 
can, at stationary state, be expressed in terms of a Bose- 
Einstein distribution with the "energy levels" depend- 
ing on the network of epistatic interactions between ge- 
netic loci. The relation of the joint distributions with the 
Bose-Einstein distribution allows us to define a possible 
condensation phase transition in the universality class of 
the Bose-Einstein condensation. Below this condensation 
phase transition a finite fraction of pairs of genetic loci 
is fixed in the population and the number of polymor- 
phic loci is strongly reduced. Therefore we propose here 
the Bose-Einstein condensation of genetic loci as a pos- 
sible mechanism contributing to the reduction in genetic 
variation within a species. 

In the future it is promising to include in this model 
the role of mutations (that increase genetic variation in 
the population), finite populations (that contribute to 
the existence of linkage disequilibrium) and the adaptive 
nature of the epistatic network. Moreover, we plan in 
future works to include in the model the possibility for a 
variable crossover rate and to go beyond the assumption 
of a locally tree-like epistatic network. Finally it would 
be interesting to characterize further the relation between 
the evolutionary dynamics and quantum mechanics [46l . 
l47j by investigating the role of condensation transitions 
present in evolution (6l-[l3| belonging to the Bose-Einstein 
universality class. 



Appendix A: Calculations of marginals using the 
cavity method (Bethe-Peierls approximation) 

In the hypothesis that the epistatic network is locally 
tree-like we look for solution to the evolutionary equation 
(O of the form given by 



(Al) 



<i,j> 



with bij{xi,Xj) to be determined by Eq. ([5|). The 
marginal frequency pij{xi,Xj) of a pair of linked loci is 
defined as 

Pvixi,x'j) = Y,P{{^})S{x^,x'M^„x'^). (A2) 

{x} 

If wc assume to know tliG functions biji^Xi^xj ) the 
marginal frequencies (|A2[) can be calculated by the cav- 
ity method j37i440| (or the Bethe Peierls approximation) 
exactly valid on locally tree-like networks. For complete- 
ness we describe here the fundamentals of the cavity 
method that will be used in the following derivation of 
the stationary state solution of the evolutionary dynam- 
ics defined in ([5]). Let us work in the hypothesis that 



the network is really a tree. Consider a known distri- 
bution function of the type (jAip with given functions 
bij{xi, Xj). Our purpose is to calculate in an efficient 
way the marginals Py(xi, Xj) defined in (|13p . These dis- 
tributions can be calculated by a simple iterative proce- 
dure. Let us introduce the restricted partition functions 
Zj\i{xj) of the sub-trees 7j|i rooted in the node j G di 
and not including node i. These subtrees are marked by 
dashed lines in Fig. [6l The restricted partition functions 
Zj\iixj) is defined as 

= X! n be-e'i^e^^e')^ (A3) 

{Xi} '^''^'ST;-!. 

where the sum is performed over all the variables xi as- 
sociated with the nodes i of the sub-tree Tj\i except for 
the variable xj . Using this definition and the assumption 
that the network is locally a tree (see Fig. [7]), it is easy to 
prove that the marginal distributions pij (xt , xj ) defined 
in (jl3|) are given by 



Piji^Xi^Xj^ — bij {^X-i ^ X Z j^j^(^Xj^ (^Xi^ . 



(A4) 



In order to calculate the restricted partition functions 
Zj\j^{xj) we use the following recursive equation, that ex- 
presses the relation between restricted partition functions 
of nested subtrees, 

■^jli(^j) = n ^K^j'^k)Zk\j{Xk)- (A5) 

These recursive equations arc sufficient to define the full 
set of restricted partition functions Zj^i{xj) within a con- 
stant that must be fixed by the normalization conditions 



(A6) 



The cavity method is proved to be exact not only on 
trees but also on locally tree-like networks. Nevertheless 
it also generally used for networks with short loops as 
long as the recursive equations (|A5|) have a solution. We 
can extend this formalism to generic distributions defined 
on locally tree-like networks in which each node is associ- 
ated to more than one variable. Let us for example con- 
sider the case of the distribution function V{{x^,x^}) 
defined in terms of the distribution P({x}) and is given 
by 



Vi{x'',x^}) 



Wi{x^, x^})P{{x^})P{{x^}) 
(W) 



(A7) 

Also this distribution function, like the distribution 
P{{x}), is defined on a tree, but in this case to each 
node i, where two variables are associated: xf and xf 
. Assuming that the distribution P{{x}) is given by Eq. 
(jAip . when the functions bij(xi,Xj) are known, we can 
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FIG. 6: Iteration tree for the calculation of the partition func- 
tion. The sub-trees rooted in all j £ di are marked with 
dashed lines 



Finally the marginals rriij {xi,Xj,x^,x'j) of the probability 
distribution V{{x^,x^}) are defined as 



ii^i, Xj , Xj^, Xj) — ^ ^ I I'^i) 

{x^}.,{xB} 

-K5{xf,x[)5{xf,x,)5{xf,xr) (AlO) 



and arc given in terms of the restricted partition func- 
tions according to the following relation 

mij{x^ ,x^ ,Xj ,Xj ) = — e P t ^ 3 ■ ^ ' i ^bij{x^ ,Xj ) 

(All) 

where the normalization constant Z can be calculated by 
imposing the normalization conditions 

J2 m,,{xf,xf,xf,xf)^l. (A12) 

xf 




\ b,.(x.x) / 



I J I 




FIG. 7: The calculation of the marginal distribution 
Pij{xi,Xj) can be expressed in terms on the restricted par- 
tition functions of the trees 7I|j and 7i|j according to Eq. 

"a. 



write the distribution V{{x^, x^}) as a product of terms 
depending on indeees (i, j) of linked pair of nodes accord- 
ing to the expression, 



n{x^,x^}) = n 



X 

<i,j> 



{W) 



n h,ixf,xf)h,{xf,xf).{A8) 



Therefore, proceeding as in the previous case, we can 
use the cavity method and define the restricted partition 
hmctions Zj|j(a;^, a;^) defined on the subtrees Tj\i and 
determined within a constant by the recursive equations 

k£dj\i x-f^ ,xj^ 

X {xf ,xt)b,k{xf,x^)Z„\jixi,x^).iA9) 



Appendix B: Characterization of the steady state 
solution 



The stationary states of Eq. ([5]) are given by the solu- 
tions to the equation 



P{{x}) = Af{,| 



W{{x^, x^})P{{x^})P{{xB}) 
{W) 



(Bl) 

If the network of epistatic interactions is locally tree- 
like, we can find the exact solution of Eq. (|Bip using 
a self-consistent argument [lol | combined with the cavity 
method [l^-S^- 

In our self-consistent assumption we suppose to know 
the functions bij {xi , Xj ) determining the distribution 
P{{x}) given by Eq. 1{M\ and the distribution x^) 
defined in Eqs. (|A7p and (jASp . both present in Eq. 
(jBip . If we suppose to know the functions bij{xi,Xj) we 
can evaluate the marginal distributions pij (xi , Xj ) and 
mij{xi,Xj,x[,x'j) by the cavity method as described in 
the previous section. Finally imposing the stability con- 
dition (|B1[) . on the marginal distributions, taking into 
account for the free recombination operator M^^} we get 

bij{xi,Xj)Zi\j{xi)Zj\i[xj) = — mij{xi,Xj,x^ '^j) 

x^ ,x^ 

+ ^ E mtj{xi,xf,xf,Xj) (B2) 



The first term in the right hand side of Eq. (jB2p comes 
from the probability that both allelic states i and j derive 
from a single parent. The second term in the right hand 
side of Eq. (jB2p . instead, describes the probability of a 
cross-over of genetic information, i.e. the event that the 
two allelic states of the new gamete originate from 
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different parental gametes. In Eq. (|B2I) we have used the 
fact that the marginals mij{xi, Xj, x^, Xj) are symmetric, 



I.e. 



as a consequence of the assumed symmetries of the fitness 
function given by Using Eq. (|JTT1) to express the 
marginals, explicitly taking into account of the depen- 
dence of the right hand side of Eq. (|B2p on bij{xi,Xj), 
we can write Eq. (|B2p as in the following 

Zhij{xi^ X j') Zj^^ji^X^^Z j^j^ix — bij {Xi ^ Xj^ Fij {^X-i ^ Xj^ 

+Gij{xi,Xj). (B4) 
where the functions Fij [xi, xj), Gij {xi, xj) are defined as 

^ , 1 ^ 1 , B B \ 

2 BB 



1 1 



-JTlij (^Xi , Xj , Xi , Xj ) , 



2 bij , Xj ) 
Gij {xi , Xj ) = — y ^ niij {Xi , .T^- , X^ , .T j ) 



(B5) 



X [\-5{x,,xf )5{xf,Xj)\ . 

These functions can be calculated by the cavity method 
in terms of the restricted partition functions Zi\j{xi) sat- 
isfying Eq. (|A9p as expressed by the following equations. 



Fij{xi,Xj) — 2 ^ ] e ^ *j( " J' I ' J '^bij{x^ ,Xj ) 
x2j^i(xj , Xj )Z.i\j{xi,Xj^ ) -|- 

X Zjyi [Xj , Xj (.Ti , Xi) 

Gij{xi,Xj) = — e '^'^'^t " J • ' ' ^'^bij{xi,Xj ) 

X* ,x^ 

xhijix^ , Xj)Zj^i(xj , (xi , Xj ) X 

X [l-5(x„xf)J(x/,Xj)] , 
Z = ^ 6.y(xi,Xj)i^ij(xi,Xj) -l-G'ij(xi,Xj). (B6) 



The Equations ()B4p , can be seen as a set of equations able 
to determine self-consistently the functions bij{xi,Xj) 
closing the self-consistent argument. The coupled equa- 
tions (jXSl) . dMJ , dMI, (fB4|) and (fBej) provide the solution 
for the stationary state of the mutiloci evolution. These 
cavity equations will in general lead to multiple solutions 
corresponding to the multiplicity of possible steady states 
of the studied evolutionary dynamics. 



Appendix C: Bose-Einstein distribution 

We want here to comment on the structure of the sta- 
tionary distribution found by the solution of the mutiloci 



evolution provided in the previous section. Solving Eqs. 
(|B4|) for bij{xi,Xj) yields 



bij {Xi, Xj) 



Gij {Xi , Xj ) 



Z Zi\j(xi)Z j^iixj) — Fij(xi, Xj) 



(CI) 



as long as \_Z Zi\j{xi)Zj\i{xj) — i^jj(xi, x^)] > 0. Let us 
for the moment assume that this last condition is always 
satisfied and relate to Appendix |D] for the study of the 
solution when the mentioned condition is not met. We 
observe that the probability pij [xi , Xj ) is given by 



Pij (j^i : ^j ) 



Gij {Xi , Xj)Zi^j {xi)Z j^i (Xj J 



^ ^i\j{-^i)^j\i (-^j ) Fij (xi , Xj ) 

The stationary solution (jCl|) can be also written as 

Gij (Xj , Xj ) I Fij [xi , Xj ) 



(C2) 



^ij ij^i 1 Xj ) 



(C3) 



valid for Ze'^»j(^»'^j' — 1 > 0, where eij{xi,Xj) and are 
defined as 



Cij (Xj , Xj ) /i 



1 



ZZi\j{xi)Zjyi{xj) 

Fij {Xi , Xj ) 



(C4) 



■In 



Using the relation (|A4)) and the Eq.s ((CT]) - ((C3)) . we 
derive the marginal probability (xi , Xj), of linked pairs 
of loci (i, j), 

Pij is^ii-^j^ — bij (Xj , Xj" ^ Zi^j{Xi^ Z j\^i (Xj ) 

= ^Gij{xi,Xj){l + nB[eij{xi,Xj)]}{C5) 

with nB[€ij{xi, Xj)] indicating the Bose distribution (4l| 
associated with "energy level" eij{xi,Xj), i.e. 



nB[eijixi,Xj)] 



1 



gl3[eij{xi,Xj)-p,] _ I ■ 



(C6) 



Since the distribution of P({x}) is normalized, fi must 
satisfy, for every pair of linked loci (i, j), the normaliza- 
tion condition 



1 



^ ] Pij (-^j I ■^i ) ■ 



(C7) 



Using the expression (jCSP for the marginal distribution 
Pij{xi,Xj) we arrive at a set of equations, 

1 = ^ ^ G^j(xi,Xj) + ^Y1 G^j{x^,XJ)nB[e^J{x^,XJ)] 

(C8) 

valid for every pair of linked loci {i,j)- Summing Eq. 
(jCSp over every pair of linked loci {i,j) we obtain 



1 = degpie)[l + 72B{e)] 



(C9) 
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where nsie) is the Bose-Einstcin distribution (jC6|) and 
5/3 (e) is given by 



(CIO) 



with xsi^) = i '^t \x\ < 6 and xsi^) = otherwise. 
Therefore the average fitness of the evolving population 
can be expressed in terms of an integral over a Bose- 
Einstein distribution with the "energy levels" to be de- 
termined self-consistently by the cavity method. 



Appendix D: Condensation transition 

An unexpected and new phenomenon can occur in this 
evolutionary process. Due to the fact that the joint prob- 
ability of pairs of allelic states can be expressed in terms 
of a Bosc-Einstein distribution we can predict that in this 
evolutionary dynamics a condensation in the same uni- 
versality class as the Bose-Einstein condensation might 
occur. In a quantum Bose gas [4l|, a Bose-Einstein con- 
densation is a phase transition at a critical value of the in- 
verse temperature /3c such that for /3 > /3c a finite fraction 
of the total number of particles is found in the ground 
state. The equivalent of this phase transition for the evo- 
lutionary dynamics described in this paper occurs when 
a finite fraction of pairs of loci gets fixed in given allelic 
configurations. Therefore, when this phenomenon occurs 
in an evolving diploid population, the number of poly- 
morphic pairs of loci is reduced by a finite fraction. 

Let us consider the case in which a pair of loci is fixed 
in the population, i.e. 



(Dl) 



Wc want to prove that this condition is equivalent to the 
condition 



(D2) 



Given Eq. we derive from the definition (jB5[) that 

the function Gij{xi, Xj) is a constant and equal to zero, 
i.e. Gij{xi,Xj) = Q ,^{xi,Xj). This result is evident if 
we observe that the only contributions to Gij {xi ,Xj), de- 
fined in (jBSp ; are given by different pairs of allelic states 
{xi,Xj) in the two parental gametes. Since we have as- 
sumed that all gametes have the same pair of allelic states 
in the genetic loci Gij{xi, Xj) = Oy{xi, Xj). Insert- 

ing this result in Eq. (jB4[) we get the following relations 



hj{xi,Xj) = Oif ^ ix*,x*) 

eij{xi,Xj) ^ iJ.if{xi,Xj) = {x*,x*). (D3) 

Similarly it is easy to prove that Eq. ()D2p implies Eq. 
(jDl[) . Therefore, if wc want to describe fixed genetic loci, 
we have to modify Eq. (jCSp , for the marginal probability 
Pij{xi, Xj) according to the following expression, 

, _^ , r l\ieij[xi,Xj) = ^ 
Pij \Xi ,Xj) <^ ^ [xi,Xj){l + ub [^ij {xi ,Xj)]} otherwise 



Accordingly, expressions (jCip and (|C3p for bij{xi,Xj) 
have to be modified in order to take into account the 
possibility that a pair of loci gets fixed. Therefore we 
have 

Uij(Xi,Xj)~^ , 



[Zi\j{xi)Z^\i{xj)] 



if eij{xt,Xj) = ^. 



The set of equations (jC8|) consistent with the normaliza- 
tion condition (jC7p is therefore modified and takse the 
form 



{1 — Uij) — y Gij{xi,Xj) + 



(D4) 



Z 



Xi , Xj 
f^ij {Xi , Xj ) ^ fj> 



valid for every pair of linked loci («,/?'). In Eq. ()D4[) the 
matrix elements Vij are taken such that Uij = 1 if a pair 
of configurations {x*,Xj) exists such that {x*, x*) = fi, 
otherwise we have Vij = 0. The study of the normaliza- 



tion equation (|D4p will define if and when the number of 
pairs of fixed loci becomes extensive. In the presence of 
a negligible fraction of fixed pairs of loci, averaging (jD4p 
over all pairs of links we get the equation (|C9p . For the 
values of the evolutionary pressure for which Eq. ()C9P 
cannot be satisfied, a finite fraction i/q of genetic loci is 
fixed and the conservation equation (|C9p has to be mod- 
ified according to 



1 



(D5) 



with (?^(e) given by (jClOp and vq defined as 



(D6) 



i.jdi Xi ,Xj 



where xsix) = 1 if |a;| < 5 and xs{x) — otherwise. As 
a function of the evolutionary pressure, a condensation 
transition can occur between a "non condensed phase" in 
which all the genetic loci are polymorphic, and a "con- 
densed phase" in which only a fraction of the genetic loci 
is polymorphic. This phase transition is in the univer- 
sality class of the Bose-Einstein condensation and it can 
be compared with other condcnsationphase transitions 
in haploid and diploid evolution 0, 0, M, Ell, [111 ■ 
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